Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods.

Authors

  • Nancy A Obuchowski
  • Sergey V Beiden
  • Kevin S Berbaum
  • Stephen L Hillis
  • Hemant Ishwaran
  • Hae Hiang Song
  • Robert F Wagner
Abstract

RATIONALE AND OBJECTIVES Several statistical methods have been developed for analyzing multireader, multicase (MRMC) receiver operating characteristic (ROC) studies. The objective of this article is to increase awareness of these methods and to determine whether their results are concordant for published datasets.

MATERIALS AND METHODS Data from three previously published studies were reanalyzed using five MRMC methods. For each method, three results were reported: the 95% confidence intervals (CIs) for the mean of the readers' ROC areas for each diagnostic test, the P value for the comparison of the diagnostic tests' mean accuracies, and the 95% CIs for the mean difference in ROC areas of the diagnostic tests.

RESULTS Important differences in P values and CIs were seen when using parametric versus nonparametric estimates of accuracy, and there were the expected differences for random-reader versus fixed-reader models. Controlling for these differences, the Dorfman-Berbaum-Metz (DBM), Obuchowski-Rockette, Beiden-Wagner-Campbell, and Song's multivariate Wilcoxon-Mann-Whitney (WMW) methods gave almost identical results for the fixed-reader model. For the random-reader model, the DBM, Obuchowski-Rockette, and Beiden-Wagner-Campbell methods yielded approximately the same inferences, but the CIs for the Beiden-Wagner-Campbell method tended to be broader. Ishwaran's hierarchical ROC sometimes yielded significance not found with other methods. Song's modification of DBM's jackknifing algorithm sometimes led to different conclusions than the original DBM algorithm.

CONCLUSION In choosing and applying MRMC methods, it is important to recognize: (1) the distinction between random-reader and fixed-reader models, the uncertainties accounted for by each, and thus the level of generalizability expected from each; (2) the assumptions made by the various MRMC methods; and (3) the limitations of a five- or six-reader study when the reader variability is great.
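As a rough illustration of two building blocks the abstract names, the sketch below computes a nonparametric (Wilcoxon-Mann-Whitney) estimate of the ROC area, averages it over readers, and forms case-level jackknife pseudovalues of the kind that jackknife-based MRMC analyses start from. It is not code from the article: the function names and the toy confidence scores are hypothetical, and the later modeling steps of any of the five methods (ANOVA on pseudovalues, variance components, CIs, P values) are not shown.

```python
from statistics import mean

def wmw_auc(diseased, nondiseased):
    """Empirical (Wilcoxon-Mann-Whitney) ROC area.

    Fraction of (diseased, nondiseased) score pairs in which the
    diseased case scores higher; ties count as one half.
    """
    total = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(nondiseased))

def reader_averaged_auc(scores_by_reader):
    """Mean of per-reader ROC areas for one diagnostic test.

    scores_by_reader is a list of (diseased, nondiseased) score lists,
    one pair per reader (a hypothetical layout chosen for this sketch).
    """
    return mean(wmw_auc(d, n) for d, n in scores_by_reader)

def jackknife_pseudovalues(diseased, nondiseased):
    """Case-level jackknife pseudovalues of the ROC area.

    For one reader and one test, each case is left out in turn and the
    pseudovalue n * full - (n - 1) * leave_one_out is recorded.
    Needs at least two diseased and two nondiseased cases.
    """
    n = len(diseased) + len(nondiseased)
    full = wmw_auc(diseased, nondiseased)
    pseudo = []
    for i in range(len(diseased)):
        loo = wmw_auc(diseased[:i] + diseased[i + 1:], nondiseased)
        pseudo.append(n * full - (n - 1) * loo)
    for j in range(len(nondiseased)):
        loo = wmw_auc(diseased, nondiseased[:j] + nondiseased[j + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo

# Toy confidence scores for two readers (made up for illustration).
readers = [
    ([3, 4, 5], [1, 2, 3]),
    ([2, 4, 5], [1, 2, 2]),
]
test_mean_auc = reader_averaged_auc(readers)
```

Repeating the last step for a second diagnostic test and subtracting the two reader-averaged areas gives the point estimate of the mean difference in ROC areas; the methods compared in the article differ in how they attach uncertainty to that difference.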

Similar resources

Generalized Roe and Metz receiver operating characteristic model: analytic link between simulated decision scores and empirical AUC variances and covariances.

Modeling and simulation are often used to understand and investigate random quantities and estimators. In 1997, Roe and Metz introduced a simulation model to validate analysis methods for the popular endpoint in reader studies to evaluate medical imaging devices, the reader-averaged area under the receiver operating characteristic (ROC) curve. Here, we generalize the notation of the model to al...

Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography

BACKGROUND Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. METHODS In a mul...

Influence of study design in receiver operating characteristics studies: sequential versus independent reading.

Observer studies to assess new image processing devices or computer-aided diagnosis techniques are often performed, but little is known about the effect of the study design on observer performance results. We investigated the effect of the sequential and independent reading design on observer study results with respect to reader performance and their statistical power. For this we performed an ...

Presentation of similar images for diagnosis of breast masses on mammograms: analysis of the effect on residents

We have been developing a computerized scheme for selecting visually similar images that would be useful to radiologists in the diagnosis of masses on mammograms. Based on the results of the observer performance study, the presentation of similar images was useful, especially for less experienced observers. The test cases included 50 benign and 50 malignant masses. Ten observers, including five...

Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation

This review provides the basic principle and rationale for ROC analysis of rating and continuous diagnostic test results versus a gold standard. Derived indexes of accuracy, in particular the area under the curve (AUC), have a meaningful interpretation for distinguishing diseased from healthy subjects. The methods of estimation of AUC and its testing in single diagnostic test and also comparative studies...

Journal:
  • Academic radiology

Volume 11, Issue 9

Pages -

Publication date: 2004